20 research outputs found

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Get PDF
    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

    The sequences of 150,119 genomes in the UK Biobank

    Get PDF
    Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data(1,2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank(3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation

    A new MRI rating scale for progressive supranuclear palsy and multiple system atrophy: validity and reliability

    Get PDF
    AIM To evaluate a standardised MRI acquisition protocol and a new image rating scale for disease severity in patients with progressive supranuclear palsy (PSP) and multiple systems atrophy (MSA) in a large multicentre study. METHODS The MRI protocol consisted of two-dimensional sagittal and axial T1, axial PD, and axial and coronal T2 weighted acquisitions. The 32 item ordinal scale evaluated abnormalities within the basal ganglia and posterior fossa, blind to diagnosis. Among 760 patients in the study population (PSP = 362, MSA = 398), 627 had per protocol images (PSP = 297, MSA = 330). Intra-rater (n = 60) and inter-rater (n = 555) reliability were assessed through Cohen's statistic, and scale structure through principal component analysis (PCA) (n = 441). Internal consistency and reliability were checked. Discriminant and predictive validity of extracted factors and total scores were tested for disease severity as per clinical diagnosis. RESULTS Intra-rater and inter-rater reliability were acceptable for 25 (78%) of the items scored (≥ 0.41). PCA revealed four meaningful clusters of covarying parameters (factor (F) F1: brainstem and cerebellum; F2: midbrain; F3: putamen; F4: other basal ganglia) with good to excellent internal consistency (Cronbach α 0.75-0.93) and moderate to excellent reliability (intraclass coefficient: F1: 0.92; F2: 0.79; F3: 0.71; F4: 0.49). The total score significantly discriminated for disease severity or diagnosis; factorial scores differentially discriminated for disease severity according to diagnosis (PSP: F1-F2; MSA: F2-F3). The total score was significantly related to survival in PSP (p<0.0007) or MSA (p<0.0005), indicating good predictive validity. CONCLUSIONS The scale is suitable for use in the context of multicentre studies and can reliably and consistently measure MRI abnormalities in PSP and MSA. Clinical Trial Registration Number The study protocol was filed in the open clinical trial registry (http://www.clinicaltrials.gov) with ID No NCT00211224

    An annotated guide to Philippine serials

    Full text link
    Data Paper: Number 101. Total number of pages: 131 p

    An annotated guide to Philippine serials

    No full text
    +131hlm.;27c

    Lambda: the local aligner for massive biological data

    No full text
    MOTIVATION:Next-generation sequencing technologies produce unprecedented amounts of data, leading to completely new research fields. One of these is metagenomics, the study of large-size DNA samples containing a multitude of diverse organisms. A key problem in metagenomics is to functionally and taxonomically classify the sequenced DNA, to which end the well-known BLAST program is usually used. But BLAST has dramatic resource requirements at metagenomic scales of data, imposing a high financial or technical burden on the researcher. Multiple attempts have been made to overcome these limitations and present a viable alternative to BLAST.RESULTS:In this work we present Lambda, our own alternative for BLAST in the context of sequence classification. In our tests, Lambda often outperforms the best tools at reproducing BLAST's results and is the fastest compared with the current state of the art at comparable levels of sensitivity. AVAILABILITY AND IMPLEMENTATION:Lambda was implemented in the SeqAn open-source C++ library for sequence analysis and is publicly available for download at http://www.seqan.de/projects/lambda. CONTACT:[email protected] SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online

    EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices

    No full text
    The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows to search a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows to search for a pattern by extending an infix of the pattern arbitrarily to the left or right. If σ is the size of the alphabet then the method of Lam et al. can conduct one step in time O(σ) while needing space O(σ⋅n) using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to O(logσ) while using O(logσ⋅n) bits of space for both, the FM and 2FM index. This is achieved by the use of binary wavelet trees. In this paper we introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in O(1) time per step while using O(logσ⋅n)+o(logσ⋅σ⋅n) bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between ≈2.2−4.2 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices
    corecore